A Resource for Natural Language Processing of Swiss German Dialects

Authors

  • Nora Hollenstein
  • Noëmi Aepli
Abstract

Since there are only a few resources for Swiss German dialects, we compiled a corpus of 115,000 tokens, manually annotated with PoS tags. The goal is to provide a basic data set for developing NLP applications for Swiss German. We extended the original corpus and improved its annotation consistency. Furthermore, we trained dialect-specific PoS-tagging models and implemented a baseline system for dialect identification.

Similar resources

Compilation of a Swiss German Dialect Corpus and its Application to PoS Tagging

Swiss German is a dialect continuum whose dialects are very different from Standard German, the official language of the German part of Switzerland. However, when dealing with Swiss German in natural language processing, the detour through Standard German is usually taken. As writing in Swiss German has become more and more popular in recent years, we would like to provide data to serve as a steppin...

Syntactic transformations for Swiss German dialects

While most dialectological research so far focuses on phonetic and lexical phenomena, we use recent fieldwork in the domain of dialect syntax to guide the development of multidialectal natural language processing tools. In particular, we develop a set of rules that transform Standard German sentence structures into syntactically valid Swiss German sentence structures. These rules are sensitive ...

Continuous variation in computational morphology - the example of Swiss German

Most work in natural language processing is geared towards written, standardized language varieties. This focus is generally justified on practical grounds of data availability and socio-economical relevance, but does not always reflect the linguistic reality of sub-standard varieties. In this paper, we aim at the computational description of the morphology of a language with continuous interna...

Morphological analysis and lemmatization for Swiss German using weighted transducers

With written Swiss German becoming more popular in everyday use, it has become a target for text processing. The absence of a standard orthography and the variety of dialects, however, lead to a vast variation in different spellings which makes this task difficult. We built a system based on weighted transducers that recognizes over 90% of the tokens in certain texts. Weights ensure preferring ...

Declarative sentence intonation patterns in 8 Swiss German dialects

This study examines declarative sentence intonation contours in 8 vastly different Swiss German dialects by the application of the Command-Response model. Fundamental frequency patterns of a controlled declarative sentence are analyzed on the global and local level of intonation. The results provide evidence of a different patterning for the dialects in the context of how global and local level...

Publication date: 2015